1 Introduction
Let $X, X_1, X_2, \ldots$ be i.i.d. $\mathbb{R}^d$ ($d \ge 1$) valued random variables and assume that the common distribution function of these variables has a Lebesgue density function, which we shall denote by $f_X$. A kernel $K$ will be any measurable function which satisfies the following conditions:

(K.i) $\int_{\mathbb{R}^d} K(s)\,ds = 1$, and

(K.ii) $\|K\|_\infty := \sup_{x \in \mathbb{R}^d} |K(x)| =: \kappa < \infty$.

The kernel density estimator of $f_X$ based upon the sample $X_1, \ldots, X_n$ and bandwidth $0 < h < 1$ is

$$f_{n,h}(x) = (nh)^{-1} \sum_{i=1}^n K\big((x - X_i)/h^{1/d}\big), \quad x \in \mathbb{R}^d.$$

It is well known that if one chooses a suitable bandwidth sequence $h_n \to 0$ and the density $f_X$ is continuous, one obtains a strongly consistent estimator $\hat f_n := f_{n,h_n}$ of $f_X$, i.e. one has with probability 1, $\hat f_n(x) \to f_X(x)$, $x \in \mathbb{R}^d$. It is also natural to investigate other modes of convergence, for instance uniform convergence, and to ask what convergence rates are feasible.
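
As an illustration of the kernel density estimator, the following minimal Python sketch evaluates it for $d = 1$. The uniform kernel on $[-1/2, 1/2]$, the simulated Uniform(0, 1) sample and all function names are assumptions made for this example only; they are not taken from the paper.

```python
import random

def uniform_kernel(u):
    """K(u) = 1 on [-1/2, 1/2] and 0 elsewhere, so (K.i) and (K.ii) hold."""
    return 1.0 if -0.5 <= u <= 0.5 else 0.0

def kde(x, sample, h, kernel=uniform_kernel):
    """Kernel density estimate (n h)^{-1} * sum_i K((x - X_i) / h) for d = 1."""
    n = len(sample)
    return sum(kernel((x - xi) / h) for xi in sample) / (n * h)

# Hypothetical data: X_i ~ Uniform(0, 1), whose density equals 1 on (0, 1).
rng = random.Random(0)
sample = [rng.random() for _ in range(5000)]
estimate_inside = kde(0.5, sample, h=0.1)    # should be close to 1
estimate_outside = kde(5.0, sample, h=0.1)   # no observations near x = 5
```

With this bounded, compactly supported kernel the estimate at a point is simply the fraction of observations within $h/2$ of $x$, divided by $h$.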

For proving such results, one usually writes the difference $\hat f_n(x) - f_X(x)$ as the sum of a probabilistic term $\hat f_n(x) - \mathbb{E}\hat f_n(x)$ and a deterministic term $\mathbb{E}\hat f_n(x) - f_X(x)$, the so-called bias. The order of the bias depends on smoothness properties of $f_X$ only, whereas the first (random) term can be studied via empirical process techniques as has been pointed
out by Q, (H, C3 and 11211 . among other authors. 



|0] (see also |0] for the 1 -dimensional case) have shown that if K is a "regular" kernel, 
the density function $f_X$ is bounded and $h_n$ satisfies the regularity conditions $h_n \searrow 0$, $h_n/h_{2n}$ is bounded, and

$$\log(1/h_n)/\log\log n \to \infty \quad \text{and} \quad n h_n/\log n \to \infty,$$

one has with probability 1,

$$\|\hat f_n - \mathbb{E}\hat f_n\|_\infty = O\Big(\sqrt{|\log h_n|/(n h_n)}\Big), \tag{1.1}$$

where $\|\cdot\|_\infty$ denotes the supremum norm on $\mathbb{R}^d$. Moreover, this rate cannot be improved.
Interestingly one does not need continuity of $f_X$ for this result. (Continuity of $f_X$ is of course needed for controlling the bias.) Recently, [6] have provided a "uniform in $h$" version of this result, that is, they have proved that, with probability 1,

$$\limsup_{n \to \infty}\ \sup_{c \log n / n \le h \le 1} \frac{\sqrt{nh}\,\|f_{n,h} - \mathbb{E} f_{n,h}\|_\infty}{\sqrt{|\log h| \vee \log\log n}} =: K(c) < \infty. \tag{1.2}$$



This result implies that if one chooses the bandwidth depending on the data and/or the 
location x, as is usually done in practice, one has the same order of convergence as in the 
case of a deterministic bandwidth sequence. 

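Now let $Y, Y_1, Y_2, \ldots$ be a sequence of $r$-dimensional random vectors ($r \ge 1$) so that the random vectors $(X, Y), (X_1, Y_1), \ldots$ are i.i.d. with common joint Lebesgue density function $f$. In this case it is also of great interest to estimate $\mathbb{E}[\psi(Y) \mid X = x]$, where $\psi : \mathbb{R}^r \to \mathbb{R}$ is a suitable mapping. A possible kernel type estimator which reduces to the classical Nadaraya-Watson estimator if $r = 1$, $\psi(y) = y$, is given by

$$\hat m_{n,h,\psi}(x) := \frac{\sum_{i=1}^n \psi(Y_i) K\big((x - X_i)/h^{1/d}\big)}{\sum_{i=1}^n K\big((x - X_i)/h^{1/d}\big)}. \tag{1.3}$$

Likewise by setting in the 1-dimensional case for $t \in \mathbb{R}$, $\psi_t(y) = 1_{(-\infty, t]}(y)$, $y \in \mathbb{R}$, we obtain the kernel estimator of the conditional distribution function

$$F(t \mid x) := \mathbb{P}\{Y \le t \mid X = x\}$$

given by

$$\hat F_{n,h}(t \mid x) := \frac{\sum_{i=1}^n 1\{Y_i \le t\} K\big((x - X_i)/h\big)}{\sum_{i=1}^n K\big((x - X_i)/h\big)}.$$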

This kernel estimator is called the conditional empirical distribution function and was first extensively studied by [17]. Exact convergence rates uniformly on compact subsets of $\mathbb{R}^d$ have been obtained for both Nadaraya-Watson type estimators as in (1.3) and the conditional empirical distribution function by |0] in the case of deterministic bandwidth sequences. Recently, [6] have established uniform in bandwidth results for these estimators which are of a similar type as result (1.2). The proof of these results requires establishing a suitable version of a result of type (1.2) for processes of the form

$$\sum_{i=1}^n \varphi(Y_i) K\Big(\frac{x - X_i}{h^{1/d}}\Big),$$

where $x \in I$ ($I$ a compact subset of $\mathbb{R}^d$ or $I = \mathbb{R}^d$) and $\varphi \in \Phi$, where $\Phi$ is a suitable class of functions.
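
To make the conditional empirical distribution function concrete, here is a small Python sketch for $d = r = 1$. The uniform kernel, the toy data and the function names are assumptions chosen for the illustration.

```python
def uniform_kernel(u):
    """Bounded kernel supported on [-1/2, 1/2] that integrates to one."""
    return 1.0 if -0.5 <= u <= 0.5 else 0.0

def conditional_edf(t, x, xs, ys, h, kernel=uniform_kernel):
    """Kernel estimate of F(t | x) = P(Y <= t | X = x) for d = r = 1."""
    weights = [kernel((x - xi) / h) for xi in xs]
    total = sum(weights)
    if total == 0.0:
        return float("nan")   # no observation falls in the window around x
    hits = sum(w for w, yi in zip(weights, ys) if yi <= t)
    return hits / total

# Toy data: the first three design points lie within h/2 = 0.5 of x = 0.1.
xs = [0.0, 0.1, 0.2, 5.0]
ys = [1.0, 2.0, 3.0, 4.0]
value = conditional_edf(2.0, 0.1, xs, ys, h=1.0)   # two of three local Y_i are <= 2
```

The estimator is a ratio of two kernel sums, which is why uniform results for the numerator and the denominator processes are needed separately.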



For certain applications, however, this class of processes could be too small. One of 
the purposes of this paper is to establish such uniform in bandwidth consistency results 
for a larger class of processes. As an application of our results, we shall prove uniform 
in bandwidth consistency of local polynomial regression estimators. Such estimators are 
generalizations of the classic Nadaraya-Watson estimator (see, especially, 0] and [19]). 
In Section 2 we will state two general consistency results, one of which will be proved 
in Section 3. In Section 4 we treat the local polynomial regression estimators. In an 
appendix we gather together some facts needed in our proofs. 



2 General consistency results 

We shall begin by stating a result proved in [6], which will be instrumental in establishing uniform in bandwidth consistency of local polynomial regression function estimators. Let $\Phi$ denote a class of measurable functions on $\mathbb{R}^r$ with a finite valued measurable envelope function $F$,

$$F(y) \ge \sup_{\varphi \in \Phi} |\varphi(y)|, \quad y \in \mathbb{R}^r. \tag{2.1}$$

Further assume that $\Phi$ is pointwise measurable and satisfies (A.2) in the Appendix with $\mathcal{G}$ replaced by $\Phi$. (For the definition of pointwise measurable also refer to the Appendix.) Consider the following class of functions

$$\mathcal{K} = \big\{K((x - \cdot)/h^{1/d}) : h > 0,\ x \in \mathbb{R}^d\big\}, \tag{2.2}$$

and assume that $\mathcal{K}$ is pointwise measurable and satisfies (A.2) with $\mathcal{G}$ replaced by $\mathcal{K}$. Introduce the class of continuous functions on a compact subset $J$ of $\mathbb{R}^d$ indexed by $\Phi$:

$$\mathcal{C} := \{c_\varphi : \varphi \in \Phi\}.$$

We shall always assume that the class $\mathcal{C}$ is relatively compact with respect to the sup-norm topology, which by the Arzelà-Ascoli theorem is equivalent to being uniformly bounded and uniformly equicontinuous.

For any $\varphi \in \Phi$ and continuous functions $c_\varphi$ on a compact subset $J$ of $\mathbb{R}^d$, set for $x \in J$,

n (x-X,C 



id 



where $K$ is a kernel with support contained in $[-1/2, 1/2]^d$ such that (K.i) and (K.ii) hold. The following result was proved in [6], where it is stated as Proposition 2. ($\|\cdot\|_I$ denotes the supremum norm on $I$.)



Theorem 1. Let $I$ be a compact subset of $\mathbb{R}^d$ such that $J = I^\eta$, for some $0 < \eta < 1$. Also assume that

$$f \text{ is continuous and strictly positive on } J. \tag{2.3}$$

Further assume that the envelope function $F$ of the class $\Phi$ satisfies

$$\exists M > 0 : F(Y)\,1\{X \in J\} \le M, \quad \text{a.s.} \tag{2.4}$$

or for some $p > 2$

$$\alpha := \sup_{z \in J} \mathbb{E}[F^p(Y) \mid X = z] < \infty. \tag{2.5}$$

Then we have for any $c > 0$ and $0 < h_0 < (2\eta)^d$, with probability 1,

,n,h r „ / » 

hmsup sup — y =: Q(c) < 00, (2.6) 

n-*oo c (logn/np<h<h J nil ( | log h\ V log log n) 

where $\gamma = 1$ in the bounded case (2.4) and $\gamma = 1 - 2/p$ under assumption (2.5).

The next result generalizes Theorem 1 in the bounded case. Its proof illustrates how the proof of Theorem 1 goes, using an empirical process approach based upon an inequality of Talagrand coupled with a moment bound for the supremum of the empirical process. These basic tools are stated in the Appendix.

In the following, $\|\cdot\|_\infty$ denotes the supremum norm on $\mathbb{R}^d$ or $\mathbb{R}^{d+r}$, whichever is appropriate. Let $\mathcal{G}$ denote a class of measurable real valued functions $g$ of $(u, t) \in \mathbb{R}^d \times \mathbb{R}^r = \mathbb{R}^{d+r}$. We shall assume that $\mathcal{G}$ satisfies:

(G.i) $\sup_{g \in \mathcal{G}} \|g\|_\infty =: \kappa < \infty$;

(G.ii) $\sup_{g \in \mathcal{G}} \int_{\mathbb{R}^{d+r}} g^2(x, y)\,dx\,dy =: L < \infty$.

Denote by $\mathcal{F}_{\mathcal{G}}$ the class of functions of $(s, t) \in \mathbb{R}^{d+r}$ formed from $\mathcal{G}$ as follows:

$$\mathcal{F}_{\mathcal{G}} = \big\{g(z - s\lambda, t) : \lambda \ge 1,\ z \in \mathbb{R}^d \text{ and } g \in \mathcal{G}\big\}.$$

We shall also assume that the class of functions $\mathcal{F}_{\mathcal{G}}$ satisfies the following uniform entropy condition:

(F.i) for some $C > 0$ and $\nu > 0$, $N(\epsilon, \mathcal{F}_{\mathcal{G}}) \le C\epsilon^{-\nu}$, $0 < \epsilon < 1$.

Finally, to avoid using outer probability measures in all of our statements, we impose the measurability assumption:

(F.ii) $\mathcal{F}_{\mathcal{G}}$ is a pointwise measurable class.

(For the definitions of pointwise measurable and of $N(\epsilon, \mathcal{F}_{\mathcal{G}})$ see the Appendix below, where we use $\kappa$ as our envelope function.)

For any $g \in \mathcal{G}$ and $0 < h < 1$ define

$$g_{n,h}(x) := (nh)^{-1} \sum_{i=1}^n g\Big(\frac{x - X_i}{h^{1/d}}, Y_i\Big), \quad x \in \mathbb{R}^d.$$




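Theorem 2. Assuming (G.i), (G.ii), (F.i), (F.ii), and $f$ (the joint density of $(X, Y)$) bounded, we have for $c > 0$ and $0 < h_0 < 1$, with probability 1,

$$\limsup_{n \to \infty}\ \sup_{c \log n / n \le h \le h_0}\ \sup_{g \in \mathcal{G}} \frac{\sqrt{nh}\,\|g_{n,h} - \mathbb{E} g_{n,h}\|_\infty}{\sqrt{|\log h| \vee \log\log n}} =: G(c) < \infty. \tag{2.7}$$

Remark. Theorem 2 is still valid for $r = 0$. In this case, $g : \mathbb{R}^d \to \mathbb{R}$ and condition (G.ii) should be read as $\sup_{g \in \mathcal{G}} \int_{\mathbb{R}^d} g^2(x)\,dx =: L < \infty$.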
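
The normalised quantity in (2.7) can be explored numerically in the case $r = 0$, $d = 1$. The sketch below is only an illustration under strong assumptions: $g = K$ is the uniform kernel on $[-1/2, 1/2]$, the data are simulated Uniform(0, 1) variables (so that $\mathbb{E} g_{n,h}$ is known in closed form), and the suprema over $x$ and $h$ are approximated on finite grids. All names are hypothetical.

```python
import math
import random

def kernel(u):
    return 1.0 if -0.5 <= u <= 0.5 else 0.0

def g_nh(x, sample, h):
    """g_{n,h}(x) = (n h)^{-1} sum_i g((x - X_i) / h) with g = kernel, d = 1."""
    return sum(kernel((x - xi) / h) for xi in sample) / (len(sample) * h)

def mean_g_nh(x, h):
    """E g_{n,h}(x) when X ~ Uniform(0, 1): P(|x - X| <= h/2) / h."""
    overlap = min(1.0, x + h / 2) - max(0.0, x - h / 2)
    return max(0.0, overlap) / h

def normalised_sup(sample, c=1.0, h0=0.5, n_x=21, n_h=20):
    """Grid approximation of the supremum over c log n / n <= h <= h0 in (2.7)."""
    n = len(sample)
    h_min = c * math.log(n) / n
    xs = [i / (n_x - 1) for i in range(n_x)]
    worst = 0.0
    for k in range(n_h):
        h = h_min * (h0 / h_min) ** (k / (n_h - 1))   # geometric bandwidth grid
        scale = math.sqrt(max(abs(math.log(h)), math.log(math.log(n))))
        dev = max(abs(g_nh(x, sample, h) - mean_g_nh(x, h)) for x in xs)
        worst = max(worst, math.sqrt(n * h) * dev / scale)
    return worst

rng = random.Random(1)
sample = [rng.random() for _ in range(2000)]
statistic = normalised_sup(sample)
```

Theorem 2 asserts that the exact version of this statistic stays bounded as $n$ grows; a single simulated value on a grid can of course only suggest this, not confirm it.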



3 Proof of Theorem 2 



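Let $\alpha_n$ be the empirical process based on the sample $(X_1, Y_1), \ldots, (X_n, Y_n)$, i.e. if $\psi : \mathbb{R}^d \times \mathbb{R}^r \to \mathbb{R}$, we have

$$\alpha_n(\psi) = \frac{1}{\sqrt{n}} \sum_{i=1}^n \big(\psi(X_i, Y_i) - \mathbb{E}\psi(X, Y)\big).$$

Notice that in this notation

$$g_{n,h}(x) - \mathbb{E} g_{n,h}(x) = \frac{1}{h\sqrt{n}}\,\alpha_n\big(g((x - \cdot)/h^{1/d}, \cdot)\big), \quad x \in \mathbb{R}^d,$$

so we get that

$$\sup_{g \in \mathcal{G}} \frac{\sqrt{nh}\,\|g_{n,h} - \mathbb{E} g_{n,h}\|_\infty}{\sqrt{|\log h| \vee \log\log n}} = \sup_{g \in \mathcal{G}}\ \sup_{x \in \mathbb{R}^d} \frac{\big|\sqrt{n}\,\alpha_n\big(g((x - \cdot)/h^{1/d}, \cdot)\big)\big|}{\sqrt{nh\,(|\log h| \vee \log\log n)}},$$

where $g((x - \cdot)/h^{1/d}, \cdot)$ denotes the function $(s, t) \mapsto g((x - s)/h^{1/d}, t)$. We first note that by (G.ii) and the assumption that $\|f\|_\infty < \infty$, the change of variables $u = (x - s)/h^{1/d}$ gives

$$\mathbb{E}\, g^2\Big(\frac{x - X}{h^{1/d}}, Y\Big) = \int_{\mathbb{R}^d}\int_{\mathbb{R}^r} g^2\Big(\frac{x - s}{h^{1/d}}, t\Big) f(s, t)\,ds\,dt \le h\,\|f\|_\infty L.$$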



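Set for $j \ge 0$ and $c > 0$,

$$h_{j,n} := (2^j c \log n)/n$$

and

$$\mathcal{F}_{j,n} := \big\{g((x - \cdot)/h^{1/d}, \cdot) : g \in \mathcal{G},\ h_{j,n} \le h \le h_{j+1,n},\ x \in \mathbb{R}^d\big\}.$$

Clearly for $h_{j,n} \le h \le h_{j+1,n}$,

$$\mathbb{E}\, g^2\Big(\frac{x - X}{h^{1/d}}, Y\Big) \le 2 h_{j,n}\,\|f\|_\infty L =: D_0 h_{j,n} =: \sigma^2_{j,n}.$$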



We shall use Proposition A.1 in the Appendix to bound $\mathbb{E}\|\sum_{i=1}^n \epsilon_i \varphi(X_i, Y_i)\|_{\mathcal{F}_{j,n}}$. To that end we note that each $\mathcal{F}_{j,n}$ satisfies (A.1) of the proposition with $G = \beta = \kappa$ and (A.3) with $\sigma^2 = \sigma^2_{j,n}$. Further, since $\mathcal{F}_{j,n} \subset \mathcal{F}_{\mathcal{G}}$, we see by (F.i) that each $\mathcal{F}_{j,n}$ also fulfills (A.2). Finally (A.4) holds for large enough $n$ and all $j \ge 0$. Now by applying Proposition A.1 we get for all large enough $n$ and $j \ge 0$,

$$\mathbb{E}\Big\|\sum_{i=1}^n \epsilon_i \varphi(X_i, Y_i)\Big\|_{\mathcal{F}_{j,n}} \le D_1 \sqrt{n h_{j,n}\,|\log(D_2 h_{j,n})|}, \tag{3.1}$$

for some $D_1 > 0$ and $D_2 > 0$. Let for large enough $n$

$$l_n := \max\{j : h_{j,n} \le 2h_0\},$$

then a little calculation shows that

$$l_n \sim \frac{\log\big(n h_0/(c \log n)\big)}{\log 2}. \tag{3.2}$$

For $k \ge 1$, set $n_k = 2^k$, and let

$$c_{j,k} := \sqrt{n_k h_{j,n_k}\big(|\log D_2 h_{j,n_k}| \vee \log\log n_k\big)}, \quad j \ge 0.$$
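
The dyadic bandwidth grid $h_{j,n} = 2^j c \log n / n$ and the index $l_n$ are easy to tabulate. The following Python sketch, with arbitrarily chosen illustrative values of $n$, $c$ and $h_0$, computes $l_n$ directly from its definition and compares it with the approximation in (3.2).

```python
import math

def h_jn(j, n, c):
    """Dyadic bandwidth h_{j,n} = 2^j * c * log(n) / n."""
    return (2 ** j) * c * math.log(n) / n

def l_n(n, c, h0):
    """max{j : h_{j,n} <= 2 h0}, assuming h_{0,n} <= 2 h0 (true for large n)."""
    j = 0
    while h_jn(j + 1, n, c) <= 2 * h0:
        j += 1
    return j

# Illustrative values only.
n, c, h0 = 2 ** 20, 1.0, 0.5
level = l_n(n, c, h0)
approx = math.log(n * h0 / (c * math.log(n))) / math.log(2)
```

For these values `level` differs from `approx` by less than one, and $h_{l_n, n}$ exceeds $h_0$, so the blocks $[h_{j,n}, h_{j+1,n}]$, $0 \le j \le l_n - 1$, cover the whole bandwidth range up to $h_0$.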
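Applying Inequality A.1 in the Appendix with

$$M = \kappa \quad \text{and} \quad \sigma^2_{\mathcal{G}} = \sigma^2_{\mathcal{F}_{j,n_k}} \le D_0 h_{j,n_k},$$

we get for any $t > 0$,

$$\mathbb{P}\Big\{\max_{n_{k-1} \le n \le n_k} \|\sqrt{n}\,\alpha_n\|_{\mathcal{F}_{j,n_k}} \ge A_1(D_1 c_{j,k} + t)\Big\} \le 2\Big[\exp\big(-A_2 t^2/(D_0 n_k h_{j,n_k})\big) + \exp(-A_2 t/\kappa)\Big].$$

Set for any $\rho > 1$, $j \ge 0$ and $k \ge 1$,

$$p_{j,k}(\rho) := \mathbb{P}\Big\{\max_{n_{k-1} \le n \le n_k} \|\sqrt{n}\,\alpha_n\|_{\mathcal{F}_{j,n_k}} \ge A_1(D_1 + \rho)\,c_{j,k}\Big\}.$$

As we have $c_{j,k}/\sqrt{n_k h_{j,n_k}} \ge \sqrt{\log\log n_k}$ and $n_k h_{j,n_k} \ge c \log n_k$, we readily obtain for $j \ge 0$ (taking $t = \rho\,c_{j,k}$)

$$p_{j,k}(\rho) \le 2\Big[\exp\Big(-\frac{\rho^2 A_2}{D_0}\log\log n_k\Big) + \exp\Big(-\frac{\sqrt{c}\,\rho A_2}{\kappa}\sqrt{\log n_k \log\log n_k}\Big)\Big],$$

which for $\gamma = \frac{A_2}{D_0} \wedge \frac{\sqrt{c}\,A_2}{\kappa}$ implies

$$p_{j,k}(\rho) \le 4\exp(-\rho\gamma \log\log n_k).$$

Thus

$$P_k(\rho) := \sum_{j=0}^{l_{n_k} - 1} p_{j,k}(\rho) \le 4\,l_{n_k}(\log n_k)^{-\rho\gamma},$$

which by (3.2) gives, for all large $k$ and large enough $\rho > 1$,

$$P_k(\rho) \le 8(\log n_k)^{1 - \rho\gamma} = 8\Big(\frac{1}{k\log 2}\Big)^{\rho\gamma - 1} \le k^{-2}.$$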



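Notice that by definition of $l_n$, for large $k$

$$2h_{l_{n_k}, n_k} = h_{l_{n_k}+1, n_k} \ge 2h_0,$$

which implies that we have for $n_{k-1} \le n \le n_k$

$$\Big[\frac{c\log n}{n}, h_0\Big] \subset \Big[\frac{c\log n_k}{n_k}, h_{l_{n_k}, n_k}\Big].$$

Thus for all large enough $k$ and $n_{k-1} \le n \le n_k$,

$$A_k(\rho) := \Big\{\max_{n_{k-1} \le n \le n_k}\ \sup_{g \in \mathcal{G}}\ \sup_{c\log n/n \le h \le h_0} \frac{\sqrt{nh}\,\|g_{n,h} - \mathbb{E} g_{n,h}\|_\infty}{\sqrt{|\log h| \vee \log\log n}} > 2A_1(D_1 + \rho)\Big\}$$

$$\subset \bigcup_{j=0}^{l_{n_k} - 1} \Big\{\max_{n_{k-1} \le n \le n_k} \|\sqrt{n}\,\alpha_n\|_{\mathcal{F}_{j,n_k}} \ge A_1(D_1 + \rho)\,c_{j,k}\Big\}.$$

It follows now for large enough $\rho$ that

$$\mathbb{P}\{A_k(\rho)\} \le P_k(\rho) \le k^{-2},$$

which by the Borel-Cantelli lemma implies our theorem. □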

4 Application to local polynomial regression function estimators

In this section we shall always assume that the assumptions of Theorem 1 hold (in particular, that $K$ has support contained in $[-1/2, 1/2]$) and $I$ is a fixed compact interval in $\mathbb{R}$. We shall also assume that $K \ge 0$.

4.1 Estimating the regression function by local polynomials 

Let $(X, Y), (X_1, Y_1), \ldots, (X_n, Y_n)$ be i.i.d. 2-dimensional random vectors and write

$$g(x) := \mathbb{E}[Y \mid X = x]$$

for the regression function. Suppose that $g$ is $(p + 1)$ times differentiable on $J = I^\eta$, then we can approximate $g(x)$ locally around $x_0 \in I$ by a polynomial of order $p$ (Taylor):

$$g(x) \approx g(x_0) + g'(x_0)(x - x_0) + \ldots + \frac{g^{(p)}(x_0)}{p!}(x - x_0)^p.$$

Then consider the weighted least-squares regression problem (WLS)

$$\operatorname*{argmin}_{\beta \in \mathbb{R}^{p+1}} \frac{1}{nh}\sum_{i=1}^n \Big[Y_i - \sum_{j=0}^p \beta_j (X_i - x_0)^j\Big]^2 K\Big(\frac{x_0 - X_i}{h}\Big). \tag{4.1}$$




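It is clear that if $\hat\beta \in \mathbb{R}^{p+1}$ is the solution of the WLS problem in (4.1), we obtain an estimator $\hat g^{(p)}_{n,h}(x_0)$ of $g(x_0)$ by taking it to be $\hat\beta_0$, the first component of $\hat\beta$. At the same time we obtain estimators of the derivatives of the regression function up to order $p$. To solve (4.1), first note that it can be written in matrix notation:

$$\operatorname*{argmin}_{\beta \in \mathbb{R}^{p+1}} (Y - X_{x_0}\beta)^t W_{x_0} (Y - X_{x_0}\beta), \tag{4.2}$$

where $W_{x_0} = (nh)^{-1}\operatorname{diag}\big(K((x_0 - X_i)/h)\big) \in \mathbb{R}^{n \times n}$, and $X_{x_0} \in \mathbb{R}^{n \times (p+1)}$, $Y \in \mathbb{R}^{n \times 1}$ and $\beta \in \mathbb{R}^{(p+1) \times 1}$ are defined as

$$X_{x_0} := \begin{pmatrix} 1 & (X_1 - x_0) & \cdots & (X_1 - x_0)^p \\ \vdots & \vdots & & \vdots \\ 1 & (X_n - x_0) & \cdots & (X_n - x_0)^p \end{pmatrix}, \quad Y := \begin{pmatrix} Y_1 \\ \vdots \\ Y_n \end{pmatrix}, \quad \beta := \begin{pmatrix} \beta_0 \\ \vdots \\ \beta_p \end{pmatrix}.$$

If we set

$$L(x_0) := \frac{1}{nh}\sum_{i=1}^n \Big[Y_i - \sum_{j=0}^p \beta_j (X_i - x_0)^j\Big]^2 K\Big(\frac{x_0 - X_i}{h}\Big),$$
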

it is not too difficult to see that for $k = 0, \ldots, p$, the partial derivatives can be written as

$$\frac{\partial L(x_0)}{\partial \beta_k} = -2\,(Y - X_{x_0}\beta)^t W_{x_0} X_{x_0} e_k,$$

where $e_k$ is the $k$-th unit vector in $\mathbb{R}^{p+1}$. So by setting the partial derivatives equal to zero, we obtain that the solution of the WLS problem (4.1) must satisfy

$$Y^t W_{x_0} X_{x_0} = \beta^t X^t_{x_0} W_{x_0} X_{x_0}.$$

Assuming that

$$S_{x_0} := X^t_{x_0} W_{x_0} X_{x_0}$$

is invertible, we can compute the solution by

$$\hat\beta_{x_0} = \big(X^t_{x_0} W_{x_0} X_{x_0}\big)^{-1} X^t_{x_0} W_{x_0} Y.$$
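
The closed-form solution of the WLS problem can be checked numerically. The sketch below is a minimal pure-Python illustration, not the authors' implementation: it builds $S_{x_0}$ and $X^t_{x_0} W_{x_0} Y$ entrywise and solves the normal equations by Gaussian elimination. The uniform kernel, the helper names and the test data are assumptions made for the example.

```python
def uniform_kernel(u):
    return 1.0 if -0.5 <= u <= 0.5 else 0.0

def solve_linear(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = s / m[i][i]
    return x

def local_poly_fit(x0, xs, ys, h, p, kernel=uniform_kernel):
    """Solve S beta = X^t W Y with W = (n h)^{-1} diag(K((x0 - X_i) / h))."""
    n = len(xs)
    w = [kernel((x0 - xi) / h) / (n * h) for xi in xs]
    s = [[sum(wi * (xi - x0) ** (j + k) for wi, xi in zip(w, xs))
          for k in range(p + 1)] for j in range(p + 1)]
    rhs = [sum(wi * yi * (xi - x0) ** j for wi, xi, yi in zip(w, xs, ys))
           for j in range(p + 1)]
    return solve_linear(s, rhs)

# Noise-free quadratic: a local quadratic fit should recover it exactly.
xs = [i / 50 for i in range(51)]
ys = [1.0 + 2.0 * x - 3.0 * x * x for x in xs]
beta = local_poly_fit(0.5, xs, ys, h=0.4, p=2)
```

For this noise-free quadratic, `beta` recovers $g(x_0)$, $g'(x_0)$ and $g''(x_0)/2$ at $x_0 = 0.5$ up to rounding, in line with the remark that the WLS solution also estimates the derivatives of the regression function.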

We shall show that asymptotically the inverse matrix of $S_{x_0}$ always exists. To see this, consider for $0 \le j \le 2p$ the functions

$$H^{(j)}(u) := (-u)^j K(u).$$

Since we assume $K$ to be bounded with support contained in $[-1/2, 1/2]$, we see that each $H^{(j)} \in L_1(\mathbb{R})$ and has support contained in $[-1/2, 1/2]$. Now for each $j \ge 0$ define the bounded function

$$\phi_j(u) = (-u)^j\,1\{u \in [-1/2, 1/2]\}.$$

Since this function is of bounded variation, the class

$$\big\{\phi_j((x - \cdot)/h) : h > 0,\ x \in \mathbb{R}\big\}$$

satisfies (A.2). (See Lemma 22 of [11].) Also, the class $\mathcal{K}$, as defined in (2.2), is assumed to be pointwise measurable and to satisfy (A.2). By Lemma A.1 in the Appendix, for each $j = 0, \ldots, 2p$, the class

$$\mathcal{G}_j := \big\{H^{(j)}((x - \cdot)/h) : h > 0,\ x \in \mathbb{R}\big\}$$

also fulfills (A.2). Moreover, it is easily checked that each $\mathcal{G}_j$ is pointwise measurable. Hence the assumptions of Theorem 2 hold and we can infer that for each $0 \le j \le 2p$, and sequence $a_n$ satisfying

$$a_n \searrow 0 \quad \text{and} \quad n a_n/\log n \to \infty, \tag{4.3}$$

we have

$$\sup_{x_0 \in I}\ \sup_{a_n \le h \le h_0} \big|\hat f^{(j)}_{n,h}(x_0) - \mathbb{E}\hat f^{(j)}_{n,h}(x_0)\big| \to 0, \quad \text{a.s.}, \tag{4.4}$$

where

$$\hat f^{(j)}_{n,h}(x_0) := \frac{1}{nh}\sum_{i=1}^n H^{(j)}\Big(\frac{x_0 - X_i}{h}\Big).$$

Notice that

$$\mathbb{E}\hat f^{(j)}_{n,h}(x_0) = \frac{1}{h}\int_{\mathbb{R}} H^{(j)}\Big(\frac{x_0 - t}{h}\Big) f(t)\,dt =: f * H^{(j)}_h(x_0),$$

and since $f$ is continuous on $J = I^\eta$ with $I$ being a compact interval, we can use Lemma A.2 in the Appendix to get that as $h \searrow 0$,

$$\sup_{x_0 \in I} \Big|\mathbb{E}\hat f^{(j)}_{n,h}(x_0) - f(x_0)\int_{\mathbb{R}} (-u)^j K(u)\,du\Big| \to 0. \tag{4.5}$$

Hence, it follows immediately by (4.4) and (4.5), that uniformly in $x_0 \in I$ and for $a_n < b_n$ with $a_n$ satisfying (4.3) and $b_n \searrow 0$,

$$\sup_{a_n \le h \le b_n} \Big|\hat f^{(j)}_{n,h}(x_0) - f(x_0)\int_{\mathbb{R}} (-u)^j K(u)\,du\Big| \to 0, \quad \text{a.s.} \tag{4.6}$$



Next consider the Hilbert space $L_2(\mathbb{R}, K\,d\lambda)$ consisting of all the measurable functions $\phi : \mathbb{R} \to \mathbb{R}$ such that

$$\int_{\mathbb{R}} \phi^2(u) K(u)\,du < \infty.$$

As usual, $\phi_1 = \phi_2$ if $\int_{\mathbb{R}} (\phi_1 - \phi_2)^2(u) K(u)\,du = 0$; that is, each $\phi \in L_2(\mathbb{R}, K\,d\lambda)$ represents an equivalence class of functions. Now let

$$G := \Big(\int_{\mathbb{R}} (-u)^{j+k} K(u)\,du\Big)_{j=0,k=0}^{p,p},$$

then $G$ is the Gramian matrix of the set of functions $\{\psi_j : \psi_j(x) = (-x)^j,\ j = 0, \ldots, p\}$ and these functions belong to $L_2(\mathbb{R}, K\,d\lambda)$ since $K$ has compact support. It is known that $G$ is nonsingular if the functions are linearly independent. Hence, in our case, $G$ will always be invertible. (Here we use $K \ge 0$ and $0 < \int_{\mathbb{R}} K(u)\,du < \infty$.) To see that $S_{x_0}$ is invertible as well, recall that the function $M \mapsto \det M$ with $M \in \mathcal{M}_{p+1}(\mathbb{R})$ is continuous, and that by (4.4) and (4.5), with probability one, the components of

$$A_{x_0} := \Big(\frac{1}{nh}\sum_{i=1}^n \Big(\frac{X_i - x_0}{h}\Big)^{j+k} K\Big(\frac{x_0 - X_i}{h}\Big)\Big)_{j=0,k=0}^{p,p}$$

converge uniformly in $x_0 \in I$ and $a_n \le h \le b_n$ with $b_n \searrow 0$ to those of $f(x_0)G$. Hence, since we assume $f$ to be strictly positive on $J = I^\eta$, for $n$ large enough, uniformly in $x_0 \in I$, we have $\det A_{x_0} > 0$. Now let $H_p := \operatorname{diag}\{1, h, \ldots, h^p\}$, note that

$$S_{x_0} = H_p A_{x_0} H_p,$$

and observe that $\det S_{x_0} = h^{p(p+1)} \det A_{x_0}$, so for $n$ large enough, uniformly in $x_0 \in I$ and $a_n \le h \le b_n$, $S_{x_0}$ will have a positive determinant, showing that asymptotically, $S_{x_0}$ is nonsingular and invertible.
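
Both facts used in this argument, the invertibility of the Gramian $G$ and the scaling $\det(H_p A H_p) = h^{p(p+1)} \det A$, can be verified in exact arithmetic for a concrete kernel. The sketch below assumes the uniform kernel on $[-1/2, 1/2]$, for which $\int (-u)^m K(u)\,du$ vanishes for odd $m$ and equals $1/((m+1)2^m)$ for even $m$; the function names are hypothetical.

```python
from fractions import Fraction

def moment(m):
    """Integral of (-u)^m over [-1/2, 1/2], i.e. for the uniform kernel."""
    if m % 2 == 1:
        return Fraction(0)
    return Fraction(1, (m + 1) * 2 ** m)

def gram(p):
    """Gramian G = (moment(j + k)) for j, k = 0, ..., p."""
    return [[moment(j + k) for k in range(p + 1)] for j in range(p + 1)]

def det(mat):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if len(mat) == 1:
        return mat[0][0]
    total = Fraction(0)
    for col, entry in enumerate(mat[0]):
        minor = [row[:col] + row[col + 1:] for row in mat[1:]]
        total += (-1) ** col * entry * det(minor)
    return total

def scaled(a, h):
    """H_p A H_p with H_p = diag(1, h, ..., h^p): entry (j, k) is h^(j+k) * a[j][k]."""
    size = len(a)
    return [[h ** (j + k) * a[j][k] for k in range(size)] for j in range(size)]

g2 = gram(2)
h = Fraction(1, 3)
```

Here `det(gram(1))` equals $1/12$ and `det(g2)` equals $1/2160$, both strictly positive, and `det(scaled(g2, h))` equals `h**6 * det(g2)`, matching $h^{p(p+1)}$ for $p = 2$.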

From the above it follows that with probability one, for all large $n$, uniformly in $x_0 \in I$ and $a_n \le h \le b_n$, the local polynomial regression estimator of $g(x_0)$ is given by

$$\hat g^{(p)}_{n,h}(x_0) = e_1^t\, S^{-1}_{x_0} X^t_{x_0} W_{x_0} Y.$$

The difficulty is to determine $S^{-1}_{x_0}$ explicitly, especially when $p$ becomes large. Moreover, it is not possible to find a nice general formula for $\hat g^{(p)}_{n,h}(x_0)$, since the calculation of $S^{-1}_{x_0}$ and $\hat g^{(p)}_{n,h}(x_0)$ becomes more complex as $p$ increases. However, we shall see in the next section that $\hat g^{(p)}_{n,h}(x_0)$ can be easily computed for $p = 0, 1, 2$.



4.2 Uniform in bandwidth consistency 



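We shall now discuss uniform in bandwidth consistency of $\hat g^{(p)}_{n,h}$ on a compact interval $I$. Define the functions

$$f_{n,h,j}(x) := \frac{1}{nh}\sum_{i=1}^n \Big(\frac{X_i - x}{h}\Big)^j K\Big(\frac{x - X_i}{h}\Big), \quad j = 0, \ldots, 2p,$$

$$r_{n,h,j}(x) := \frac{1}{nh}\sum_{i=1}^n Y_i \Big(\frac{X_i - x}{h}\Big)^j K\Big(\frac{x - X_i}{h}\Big), \quad j = 0, \ldots, p.$$

By Theorem 2,

$$\limsup_{n \to \infty}\ \sup_{c\log n/n \le h \le h_0}\ \max_{0 \le j \le 2p} \frac{\sqrt{nh}\,\|f_{n,h,j} - \mathbb{E} f_{n,h,j}\|_\infty}{\sqrt{|\log h| \vee \log\log n}} < \infty, \quad \text{a.s.},$$
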



and by Theorem 1 with obvious identifications and K replaced by 



For $j \ge 0$, set

$$\mu_j := \int_{\mathbb{R}} (-u)^j K(u)\,du,$$

and define

$$f_j(x) := \mu_j f_X(x), \quad j = 0, \ldots, 2p,$$

$$r_j(x) := \mu_j \int_{\mathbb{R}} y f(x, y)\,dy, \quad j = 0, \ldots, p.$$

Lemma A.2 gives (also see (4.5)) that for all $0 \le j \le 2p$

$$\sup_{a_n \le h \le b_n} \|\mathbb{E} f_{n,h,j} - f_j\|_I \to 0. \tag{4.7}$$

Now define the function

$$\varphi(x) := \int_{\mathbb{R}} y f(x, y)\,dy, \quad x \in J,$$

and introduce the assumption:

$$\text{for all } x \in J,\ \lim_{x' \to x} f(x', y) = f(x, y) \text{ for almost every } y \in \mathbb{R}. \tag{4.8}$$

Then by an argument based on the Lebesgue dominated convergence theorem, using assumptions (2.3) along with (2.4) or (2.5), one readily shows that $\varphi$ is bounded and continuous on $J$. Applying Lemma A.2, we get that for all $0 \le j \le p$,

$$\sup_{a_n \le h \le b_n} \|\mathbb{E} r_{n,h,j} - r_j\|_I \to 0. \tag{4.9}$$

From these observations, we easily conclude that for all smooth functions $\Phi : \mathbb{R}^{3p+2} \to \mathbb{R}$ and suitable sequences $0 < a_n < b_n$ depending on Theorem 1 and whether (2.4) or (2.5) holds, with probability 1,

$$\sup_{a_n \le h \le b_n} \big\|\Phi(f_{n,h,0}, \ldots, f_{n,h,2p}, r_{n,h,0}, \ldots, r_{n,h,p}) - \Phi(f_0, \ldots, f_{2p}, r_0, \ldots, r_p)\big\|_I \to 0. \tag{4.10}$$

When (2.4) is in force, we assume that $a_n$ satisfies (4.3), and when (2.5) holds that $a_n = c(\log n/n)^\gamma$ with $\gamma = 1 - 2/p$.

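Calculation for $p = 0$. In this case we get the usual Nadaraya-Watson regression estimator:

$$\hat g^{(0)}_{n,h}(x_0) = \frac{\sum_{i=1}^n Y_i K\big((x_0 - X_i)/h\big)}{\sum_{i=1}^n K\big((x_0 - X_i)/h\big)} = \frac{r_{n,h,0}(x_0)}{f_{n,h,0}(x_0)}.$$

So applying (4.10) with $\Phi(x_1, x_2) = x_2/x_1$, we get that uniformly in $x_0 \in I$,

$$\sup_{a_n \le h \le b_n} \big|\hat g^{(0)}_{n,h}(x_0) - g(x_0)\big| \to 0, \quad \text{a.s.},$$

proving the uniform in bandwidth consistency of the Nadaraya-Watson estimator.

From now on, for ease of notation we shall omit the subscripts $x_0$, as well as the argument $(x_0)$ in all the functions that we defined above.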



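Calculation for $p = 1$. This is the local linear regression estimator, where $S$ and $X'WY$ are given by

$$S = \begin{pmatrix} nh f_{n,h,0} & nh^2 f_{n,h,1} \\ nh^2 f_{n,h,1} & nh^3 f_{n,h,2} \end{pmatrix}, \qquad X'WY = \begin{pmatrix} nh\,r_{n,h,0} \\ nh^2 r_{n,h,1} \end{pmatrix},$$

such that

$$S^{-1}X'WY = \frac{1}{f_{n,h,0}f_{n,h,2} - f^2_{n,h,1}} \begin{pmatrix} f_{n,h,2}\,r_{n,h,0} - f_{n,h,1}\,r_{n,h,1} \\ h^{-1}\big(f_{n,h,0}\,r_{n,h,1} - f_{n,h,1}\,r_{n,h,0}\big) \end{pmatrix}.$$

Hence, the local linear estimator of the regression function is given by

$$\hat g^{(1)}_{n,h} = \frac{f_{n,h,2}\,r_{n,h,0} - f_{n,h,1}\,r_{n,h,1}}{f_{n,h,0}f_{n,h,2} - f^2_{n,h,1}}.$$

So applying (4.10) with

$$\Phi(x_1, \ldots, x_5) = \frac{x_3 x_4 - x_2 x_5}{x_1 x_3 - x_2^2},$$

we obtain after a little algebra based on the definitions of $f_j$ and $r_j$ the uniform in bandwidth consistency of this local linear estimator:

$$\sup_{a_n \le h \le b_n} \big\|\hat g^{(1)}_{n,h} - g\big\|_I \to 0, \quad \text{a.s.}$$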
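
The closed-form expressions for $p = 0$ and $p = 1$ translate directly into code. The following Python sketch computes $f_{n,h,j}$ and $r_{n,h,j}$ and evaluates both estimators; the uniform kernel, the irregular design and the function names are assumptions of the example. On noise-free linear data the local linear estimator reproduces the regression line exactly, which provides a simple sanity check of the formula.

```python
def kernel(u):
    return 1.0 if -0.5 <= u <= 0.5 else 0.0

def f_nhj(x, xs, h, j):
    """f_{n,h,j}(x) = (n h)^{-1} sum_i ((X_i - x) / h)^j K((x - X_i) / h)."""
    return sum(((xi - x) / h) ** j * kernel((x - xi) / h) for xi in xs) / (len(xs) * h)

def r_nhj(x, xs, ys, h, j):
    """r_{n,h,j}(x) = (n h)^{-1} sum_i Y_i ((X_i - x) / h)^j K((x - X_i) / h)."""
    return sum(yi * ((xi - x) / h) ** j * kernel((x - xi) / h)
               for xi, yi in zip(xs, ys)) / (len(xs) * h)

def nadaraya_watson(x, xs, ys, h):
    """Case p = 0: ratio r_{n,h,0} / f_{n,h,0}."""
    return r_nhj(x, xs, ys, h, 0) / f_nhj(x, xs, h, 0)

def local_linear(x, xs, ys, h):
    """Case p = 1: (f_2 r_0 - f_1 r_1) / (f_0 f_2 - f_1^2)."""
    f0, f1, f2 = (f_nhj(x, xs, h, j) for j in range(3))
    r0, r1 = (r_nhj(x, xs, ys, h, j) for j in range(2))
    return (f2 * r0 - f1 * r1) / (f0 * f2 - f1 ** 2)

# Irregular design on [0, 1] with noise-free linear responses Y = 2 + 3 X.
xs = [i * i / 1600 for i in range(41)]
ys = [2.0 + 3.0 * xi for xi in xs]
x0, h = 0.37, 0.3
ll = local_linear(x0, xs, ys, h)
nw = nadaraya_watson(x0, xs, ys, h)
```

The Nadaraya-Watson value `nw` is only a local average of the responses and therefore generally carries a bias on an asymmetric design, whereas `ll` equals $2 + 3x_0$ up to rounding.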



Calculation for $p = 2$. As we have seen in the case $p = 1$, the main work in deriving $\hat g^{(p)}_{n,h}$ is to determine $S^{-1}$. Now $S$ is a $3 \times 3$-matrix, so we can still write down the inverse without difficulties. After some calculations, we obtain (disregarding $nh^j$ factors):

$$S^{-1} = \frac{1}{\det S} \begin{pmatrix} f_{n,h,2}f_{n,h,4} - f^2_{n,h,3} & f_{n,h,2}f_{n,h,3} - f_{n,h,1}f_{n,h,4} & f_{n,h,1}f_{n,h,3} - f^2_{n,h,2} \\ f_{n,h,2}f_{n,h,3} - f_{n,h,1}f_{n,h,4} & f_{n,h,0}f_{n,h,4} - f^2_{n,h,2} & f_{n,h,1}f_{n,h,2} - f_{n,h,0}f_{n,h,3} \\ f_{n,h,1}f_{n,h,3} - f^2_{n,h,2} & f_{n,h,1}f_{n,h,2} - f_{n,h,0}f_{n,h,3} & f_{n,h,0}f_{n,h,2} - f^2_{n,h,1} \end{pmatrix}$$

and

$$X'WY = \begin{pmatrix} r_{n,h,0} \\ r_{n,h,1} \\ r_{n,h,2} \end{pmatrix},$$

eventually yielding

$$\hat g^{(2)}_{n,h} = \frac{(f_{n,h,2}f_{n,h,4} - f^2_{n,h,3})\,r_{n,h,0} + (f_{n,h,2}f_{n,h,3} - f_{n,h,1}f_{n,h,4})\,r_{n,h,1} + (f_{n,h,1}f_{n,h,3} - f^2_{n,h,2})\,r_{n,h,2}}{f_{n,h,0}f_{n,h,2}f_{n,h,4} - f_{n,h,0}f^2_{n,h,3} - f^2_{n,h,1}f_{n,h,4} + 2f_{n,h,1}f_{n,h,2}f_{n,h,3} - f^3_{n,h,2}}.$$

So using the function

$$\Phi(x_1, \ldots, x_8) = \frac{(x_3x_5 - x_4^2)x_6 + (x_3x_4 - x_2x_5)x_7 + (x_2x_4 - x_3^2)x_8}{x_1x_3x_5 - x_1x_4^2 - x_2^2x_5 + 2x_2x_3x_4 - x_3^3}$$

in (4.10), we infer after some algebra based on the definitions of $f_j$ and $r_j$, the uniform in bandwidth consistency of this local quadratic regression function estimator.



Calculation for larger $p$. In principle it is possible to write down an explicit formula for the local polynomial estimator $\hat g^{(p)}_{n,h}(x_0)$ for any $p \ge 0$, by first computing the inverse of $S_{x_0}$, multiplying it by $X^t_{x_0} W_{x_0} Y$ and then by taking the first component of the resulting vector. But the difficulty lies in determining $S^{-1}_{x_0}$.

Remark. It was pointed in and QJLj] that these methods can be used to study the 
uniform in bandwidth consistency of local polynomial regression estimators. 



5 Appendix 

Let $X, X_1, \ldots, X_n$ be i.i.d. from a probability space $(\mathcal{X}, \mathcal{A}, P)$ with common distribution $f_X$. Let $\mathcal{G}$ be a pointwise measurable class of real valued functions defined on $\mathcal{X}$, i.e. we assume that there exists a countable subclass $\mathcal{G}_0$ of $\mathcal{G}$ so that we can find for any function $g$ in $\mathcal{G}$ a sequence of functions $\{g_m\}$ in $\mathcal{G}_0$ for which $g_m(x) \to g(x)$, $x \in \mathcal{X}$. (See Example 2.3.4, [20].) Further let $\epsilon_1, \ldots, \epsilon_n$ be a sequence of independent Rademacher random variables independent of $X_1, \ldots, X_n$.

The following inequality is essentially due to [18] (see H).



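Inequality A.1 Let $\mathcal{G}$ be a pointwise measurable class of functions satisfying for some $0 < M < \infty$

$$\|g\|_\infty \le M, \quad g \in \mathcal{G},$$

then for all $t > 0$ we have for suitable finite constants $A_1, A_2 > 0$,

$$\mathbb{P}\Big\{\max_{1 \le m \le n} \|\sqrt{m}\,\alpha_m\|_{\mathcal{G}} \ge A_1\Big(\mathbb{E}\Big\|\sum_{i=1}^n \epsilon_i g(X_i)\Big\|_{\mathcal{G}} + t\Big)\Big\} \le 2\Big(\exp\big(-A_2 t^2/(n\sigma^2_{\mathcal{G}})\big) + \exp(-A_2 t/M)\Big),$$

where $\sigma^2_{\mathcal{G}} = \sup_{g \in \mathcal{G}} \operatorname{Var}(g(X))$.

It enables us to reduce many problems on almost sure convergence to investigating the moment quantity

$$\mu_n := \mathbb{E}\Big\|\sum_{i=1}^n \epsilon_i g(X_i)\Big\|_{\mathcal{G}}.$$

The following proposition proved in H] is very helpful for obtaining bounds on this quantity, when the class $\mathcal{G}$ has a polynomial covering number. Let $G$ be a finite valued measurable function satisfying for all $x \in \mathcal{X}$

$$G(x) \ge \sup_{g \in \mathcal{G}} |g(x)|,$$

and define

$$N(\epsilon, \mathcal{G}) := \sup_Q N\big(\epsilon\sqrt{Q(G^2)}, \mathcal{G}, d_Q\big),$$

where the supremum is taken over all probability measures $Q$ on $(\mathcal{X}, \mathcal{A})$ for which $0 < Q(G^2) < \infty$ and $d_Q$ is the $L_2(Q)$-metric. As usual $N(\epsilon, \mathcal{G}, d)$ is the minimal number of balls $\{g : d(g, f) < \epsilon\}$ of $d$-radius $\epsilon$ needed to cover $\mathcal{G}$.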



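Proposition A.1 Let $\mathcal{G}$ be a pointwise measurable class of bounded functions such that for some constants $\beta, \nu, C > 1$, $\sigma \le 1/(8C)$ and function $G$ as above, the following four conditions hold:

$$\mathbb{E}[G^2(X)] \le \beta^2; \tag{A.1}$$

$$N(\epsilon, \mathcal{G}) \le C\epsilon^{-\nu}, \quad 0 < \epsilon < 1; \tag{A.2}$$

$$\sigma_0^2 := \sup_{g \in \mathcal{G}} \mathbb{E}[g^2(X)] \le \sigma^2; \tag{A.3}$$

$$\sup_{g \in \mathcal{G}} \|g\|_\infty \le \frac{1}{4\sqrt{\nu}}\sqrt{n\sigma^2/\log(\beta \vee 1/\sigma)}. \tag{A.4}$$

Then we have for a universal constant $A$

$$\mathbb{E}\Big\|\sum_{i=1}^n \epsilon_i g(X_i)\Big\|_{\mathcal{G}} \le A\sqrt{\nu n\sigma^2 \log(\beta \vee 1/\sigma)}. \tag{A.5}$$
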

Another version of Proposition A.l has been proved by 0]. For refinements, consult Jfjj] 
andd 

We shall also require the following two lemmas. The first is proved in [0]. 

Here is Lemma A.l of [5]. 

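Lemma A.1 Let $\mathcal{F}$ and $\mathcal{G}$ be two classes of real valued measurable functions on $\mathcal{X}$ satisfying

$$|f(x)| \le F(x), \quad f \in \mathcal{F},\ x \in \mathcal{X},$$

where $F$ is a finite valued measurable envelope function on $\mathcal{X}$;

$$\|g\|_\infty \le M, \quad g \in \mathcal{G},$$

where $M > 0$ is a finite constant. Assume that for all p-measures $Q$ with $0 < Q(F^2) < \infty$,

$$N\big(\epsilon\sqrt{Q(F^2)}, \mathcal{F}, d_Q\big) \le C_1\epsilon^{-\nu_1}, \quad 0 < \epsilon < 1,$$

and for all p-measures $Q$,

$$N(\epsilon M, \mathcal{G}, d_Q) \le C_2\epsilon^{-\nu_2}, \quad 0 < \epsilon < 1,$$

where $\nu_1, \nu_2, C_1, C_2 \ge 1$ are suitable constants. Then we have for all p-measures $Q$ with $0 < Q(F^2) < \infty$,

$$N\big(\epsilon M\sqrt{Q(F^2)}, \mathcal{F}\mathcal{G}, d_Q\big) \le C_3\epsilon^{-\nu_1 - \nu_2}, \quad 0 < \epsilon < 1,$$

for some finite constant $0 < C_3 < \infty$.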

The next lemma can be inferred from results in [13, pp. 62-65].

Lemma A.2. Let $\varphi$ be a measurable function on $\mathbb{R}^d$, which for some $\gamma > 0$ is bounded and uniformly continuous on $D^\gamma$, where $D$ is a closed subset of $\mathbb{R}^d$ and

$$D^\gamma = \{x \in \mathbb{R}^d : |x - y| \le \gamma,\ y \in D\}.$$

Then for any $L_1(\mathbb{R}^d)$ function $H$, which is equal to zero for $x \notin I^d$,

$$\sup_{z \in D} |\varphi * H_h(z) - I(H)\varphi(z)| \to 0, \quad \text{as } h \searrow 0,$$

where $I(H) = \int_{\mathbb{R}^d} H(u)\,du$ and $\varphi * H_h(z) := h^{-1}\int_{\mathbb{R}^d} \varphi(x) H\big(h^{-1/d}(z - x)\big)\,dx$.
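
Lemma A.2 can be illustrated numerically for $d = 1$. In the sketch below, $H$ is taken to be the indicator of $[-1/2, 1/2]$ (so $I(H) = 1$), $\varphi = \cos$, and $D = [0, 1]$ is replaced by a finite grid; these choices, the midpoint-rule integration and the function names are assumptions made for the example.

```python
import math

def smoothed(phi, z, h, steps=2000):
    """(phi * H_h)(z) = h^{-1} * integral of phi(x) H((z - x) / h) dx, d = 1,
    for H = indicator of [-1/2, 1/2], via the midpoint rule on [z - h/2, z + h/2]."""
    width = h / steps
    total = sum(phi(z - h / 2 + (k + 0.5) * width) for k in range(steps))
    return total * width / h

def sup_error(phi, zs, h):
    """Grid version of sup_z |phi * H_h(z) - I(H) phi(z)| with I(H) = 1."""
    return max(abs(smoothed(phi, z, h) - phi(z)) for z in zs)

zs = [i / 10 for i in range(11)]                  # grid on D = [0, 1]
errors = [sup_error(math.cos, zs, h) for h in (1.0, 0.5, 0.1, 0.01)]
```

For this choice the convolution equals $\cos(z)\,\sin(h/2)/(h/2)$, so the supremum error is $1 - \sin(h/2)/(h/2)$, of order $h^2/24$, and the computed `errors` decrease accordingly as $h \searrow 0$.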